feat: Implement ZIP64 support in pzip for large file handling#345
Conversation
- Introduced `forceFlush` method to ensure all buffered data is output immediately.
- Updated `flushOutput` method to flush only when the buffer exceeds 256KB, reducing system calls.
- Modified `storeFast` and `close` methods to use `forceFlush` for final data output.

Log: Enhance data flushing mechanisms in FlateWriter
- Added constants for ZIP64 thresholds and version requirements.
- Enhanced ZipFileHeader with an `isZip64()` method to determine ZIP64 status.
- Updated `writeLocalFileHeader` and `writeDataDescriptor` methods to handle ZIP64 sizes.
- Modified `writeCentralDirectory` and `writeEndOfCentralDirectory` to accommodate ZIP64 records and offsets.
- Introduced ZIP64 extra fields for central directory entries and end-of-central-directory records.

Log: Improve ZIP file handling for large files with ZIP64 support
Reviewer's Guide

Adds ZIP64 support to the pzip ZipWriter so it can handle large files and large central directories, and adjusts the deflate writer buffering to flush less often while still guaranteeing all data is flushed on close.

Sequence diagram for ZipWriter close with ZIP64 central directory and EOCD handling

sequenceDiagram
actor Client
participant ZipWriter
participant File as file_
Client->>ZipWriter: close()
activate ZipWriter
ZipWriter->>ZipWriter: centralDirOffset = currentOffset_
ZipWriter->>ZipWriter: writeCentralDirectory()
activate ZipWriter
loop for each entry in centralDir_
ZipWriter->>ZipWriter: needZip64 = header.isZip64() || localHeaderOffset >= ZIP_UINT32_MAX
alt needZip64
ZipWriter->>ZipWriter: write central dir header with ZIP_UINT32_MAX sizes
ZipWriter->>ZipWriter: build zip64Extra (sizes, offset)
else not needZip64
ZipWriter->>ZipWriter: write central dir header with 32-bit sizes
end
ZipWriter->>File: write fixed central dir fields
ZipWriter->>File: write filename
ZipWriter->>File: write original extra
alt needZip64
ZipWriter->>File: write zip64Extra
end
ZipWriter->>ZipWriter: update currentOffset_
end
deactivate ZipWriter
ZipWriter->>ZipWriter: centralDirSize = currentOffset_ - centralDirOffset
ZipWriter->>ZipWriter: writeEndOfCentralDirectory(centralDirOffset, centralDirSize)
activate ZipWriter
ZipWriter->>ZipWriter: records = centralDir_.size()
ZipWriter->>ZipWriter: size = centralDirSize
ZipWriter->>ZipWriter: offset = centralDirOffset
ZipWriter->>ZipWriter: needZip64 = (records >= ZIP_UINT16_MAX || size >= ZIP_UINT32_MAX || offset >= ZIP_UINT32_MAX)
alt needZip64
ZipWriter->>File: write ZIP64 EOCD record
ZipWriter->>File: write ZIP64 EOCD locator
ZipWriter->>ZipWriter: records = ZIP_UINT16_MAX
ZipWriter->>ZipWriter: size = ZIP_UINT32_MAX
ZipWriter->>ZipWriter: offset = ZIP_UINT32_MAX
ZipWriter->>ZipWriter: update currentOffset_
end
ZipWriter->>File: write classic EOCD with records, size, offset
ZipWriter->>File: write comment
deactivate ZipWriter
ZipWriter->>File: close()
deactivate ZipWriter
ZipWriter-->>Client: close() result
Sequence diagram for FlateWriter output buffering and flushing

sequenceDiagram
actor Client
participant FlateWriter
participant BitWriter as writer_
participant Output as output_
Client->>FlateWriter: fillBlock(data, size)
activate FlateWriter
FlateWriter->>BitWriter: write compressed tokens
FlateWriter->>FlateWriter: flushOutput()
activate FlateWriter
FlateWriter->>BitWriter: buf = writer_.data()
alt buf.size() >= 256KB and output_ set
FlateWriter->>Output: output_(buf.data(), buf.size())
FlateWriter->>BitWriter: buf.clear()
else buffer below threshold
FlateWriter->>FlateWriter: do not flush
end
deactivate FlateWriter
deactivate FlateWriter
Client->>FlateWriter: close()
activate FlateWriter
FlateWriter->>FlateWriter: if already closed return
FlateWriter->>FlateWriter: finish pending compression
FlateWriter->>BitWriter: writer_->flush()
FlateWriter->>FlateWriter: forceFlush()
activate FlateWriter
FlateWriter->>BitWriter: buf = writer_.data()
alt buf not empty and output_ set
FlateWriter->>Output: output_(buf.data(), buf.size())
FlateWriter->>BitWriter: buf.clear()
else no remaining data
FlateWriter->>FlateWriter: nothing to flush
end
deactivate FlateWriter
FlateWriter->>FlateWriter: mark closed
deactivate FlateWriter
FlateWriter-->>Client: close() result
Class diagram for updated ZIP64-related structures and writers

classDiagram
class ZipFileHeader {
+string name
+uint16_t versionMadeBy
+uint16_t versionNeeded
+uint16_t flags
+uint16_t compressionMethod
+time_t modTime
+uint32_t crc32
+uint64_t compressedSize
+uint64_t uncompressedSize
+uint16_t internalAttr
+uint32_t externalAttr
+bool isDirectory()
+bool isZip64()
}
class ZipWriter {
-std::ofstream file_
-uint64_t currentOffset_
-std::string comment_
-std::vector~CentralDirEntry~ centralDir_
+Error writeLocalFileHeader(ZipFileHeader header)
+Error writeDataDescriptor(ZipFileHeader header)
+Error writeCentralDirectory()
+Error writeEndOfCentralDirectory(uint64_t centralDirOffset, uint64_t centralDirSize)
+Error close()
+static void timeToDos(time_t t, uint16_t date, uint16_t dosTime)
}
class FlateWriter {
-std::unique_ptr~BitWriter~ writer_
-std::function~void(const uint8_t* data, size_t size)~ output_
-TokenBuffer tokens_
-uint32_t windowEnd_
-bool useL1_
+size_t fillBlock(const uint8_t* data, size_t size)
+void storeFast()
+void flushOutput()
+void forceFlush()
+void close()
+FastGen* encoder()
}
ZipWriter --> ZipFileHeader : uses
FlateWriter --> ZipWriter : feeds compressed data
Hey - I've found 4 issues, and left some high level feedback:
- In writeCentralDirectory, `needZip64` is computed with `entry.localHeaderOffset >= ZIP_UINT32_MAX`, but the actual central directory local header offset is only clamped when `entry.localHeaderOffset > ZIP_UINT32_MAX`; to avoid inconsistent signalling of ZIP64 usage around the exact 0xFFFFFFFF boundary, consider using the same `>= ZIP_UINT32_MAX` condition in both places.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In writeCentralDirectory, `needZip64` is computed with `entry.localHeaderOffset >= ZIP_UINT32_MAX` but the actual central directory local header offset is only clamped when `entry.localHeaderOffset > ZIP_UINT32_MAX`; to avoid inconsistent signalling of ZIP64 usage around the exact 0xFFFFFFFF boundary, consider using the same `>= ZIP_UINT32_MAX` condition in both places.
## Individual Comments
### Comment 1
<location> `3rdparty/pzip/src/zip_writer.cpp:393-379` </location>
<code_context>
- // Extra field length
- write16(static_cast<uint16_t>(h.extra.size()));
+ // Build the ZIP64 extra field
+ std::vector<uint8_t> zip64Extra;
+ if (needZip64) {
+ // 28 bytes: 2x uint16 + 3x uint64
+ // Header ID
</code_context>
<issue_to_address>
**issue (bug_risk):** Building a new ZIP64 extra block may conflict with an existing ZIP64 extra in `h.extra`.
When `needZip64` is true, a new ZIP64 extra field is always appended without checking whether `h.extra` already includes one. If callers pre-populate a ZIP64 extra, this can create duplicate or conflicting entries. Consider either updating an existing ZIP64 extra if present, or clearly documenting that callers must not provide ZIP64 extra data and that this writer is the sole owner of it.
</issue_to_address>
### Comment 2
<location> `3rdparty/pzip/src/zip_writer.cpp:417-418` </location>
<code_context>
+ }
+ }
+
+ // Extra field length (original extra + ZIP64 extra)
+ size_t extraLen = h.extra.size() + zip64Extra.size();
+ write16(static_cast<uint16_t>(extraLen));
</code_context>
<issue_to_address>
**issue (bug_risk):** Potential truncation if combined extra field length exceeds `UINT16_MAX`.
`extraLen` is computed as `size_t` and then cast to `uint16_t` in `write16(static_cast<uint16_t>(extraLen));`. If `h.extra.size() + zip64Extra.size()` exceeds `0xFFFF`, this will be truncated and produce an invalid ZIP central directory entry. Please add a check for this condition and either fail with a clear error or otherwise handle the overflow explicitly.
</issue_to_address>
### Comment 3
<location> `3rdparty/pzip/src/fast_deflate.cpp:1174-1175` </location>
<code_context>
-// Flush data from writer_ to the output target
+// Flush data from writer_ to the output target (only when the buffer is large enough)
void FlateWriter::flushOutput() {
+ auto& buf = writer_->data();
+ // Only flush when the buffer exceeds 256KB, to reduce system calls
+ if (buf.size() >= 256 * 1024 && output_) {
</code_context>
<issue_to_address>
**question (bug_risk):** `flushOutput` now only flushes on large buffers, which changes its semantics and may surprise callers.
This optimization reduces syscalls but changes the meaning of "flush" and may increase peak memory or delay data delivery. Since `flushOutput` is used from `storeFast()` and `forceFlush()` is used from `close()`, this is likely intentional, but please (a) confirm no callers depend on `flushOutput` flushing immediately, or (b) consider renaming one of the methods to distinguish "soft" vs "hard" flush semantics more clearly.
</issue_to_address>
### Comment 4
<location> `3rdparty/pzip/src/zip_writer.cpp:197` </location>
<code_context>
}
Error ZipWriter::writeDataDescriptor(const ZipFileHeader& header) {
+ // Mirrors writeDataDescriptor in Go's archive/zip
+ // If isZip64(), use 24 bytes (4+4+8+8); otherwise use 16 bytes (4+4+4+4)
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the repeated little-endian writing and ZIP64 record/extra-field construction into shared helper functions to simplify these ZIP writer methods.
You can reduce the added complexity without changing behavior by extracting the repetitive little‑endian writing and ZIP64 building logic into small helpers, then using them in `writeCentralDirectory` and `writeEndOfCentralDirectory`.
### 1. Centralize little‑endian writers
Right now, each of `writeLocalFileHeader`, `writeDataDescriptor`, `writeCentralDirectory`, and `writeEndOfCentralDirectory` defines its own `write16`/`write32`/`write64` lambdas.
You can keep all behavior and simplify by introducing reusable helpers:
```cpp
// In an internal namespace / anonymous namespace in the .cpp
inline void appendLE16(std::vector<uint8_t>& buf, uint16_t v) {
buf.push_back(static_cast<uint8_t>(v));
buf.push_back(static_cast<uint8_t>(v >> 8));
}
inline void appendLE32(std::vector<uint8_t>& buf, uint32_t v) {
appendLE16(buf, static_cast<uint16_t>(v));
appendLE16(buf, static_cast<uint16_t>(v >> 16));
}
inline void appendLE64(std::vector<uint8_t>& buf, uint64_t v) {
appendLE32(buf, static_cast<uint32_t>(v));
appendLE32(buf, static_cast<uint32_t>(v >> 32));
}
```
Then e.g. in `writeDataDescriptor`:
```cpp
Error ZipWriter::writeDataDescriptor(const ZipFileHeader& header) {
std::vector<uint8_t> buf;
buf.reserve(header.isZip64() ? 24 : 16);
appendLE32(buf, DATA_DESCRIPTOR_SIG);
appendLE32(buf, header.crc32);
if (header.isZip64()) {
appendLE64(buf, header.compressedSize);
appendLE64(buf, header.uncompressedSize);
} else {
appendLE32(buf, static_cast<uint32_t>(header.compressedSize));
appendLE32(buf, static_cast<uint32_t>(header.uncompressedSize));
}
file_.write(reinterpret_cast<const char*>(buf.data()), buf.size());
if (!file_.good()) {
return Error(ErrorCode::FILE_WRITE_ERROR, "Failed to write data descriptor");
}
currentOffset_ += buf.size();
return Error();
}
```
Same pattern can replace the lambdas in `writeLocalFileHeader`, `writeCentralDirectory`, and `writeEndOfCentralDirectory`.
### 2. Factor ZIP64 extra field construction
The ZIP64 extra field logic in `writeCentralDirectory` currently has three manual `for (int i = 0; i < 8; ++i)` loops. That can be encapsulated to make the loop more declarative:
```cpp
inline void buildZip64Extra(const ZipFileHeader& h,
uint64_t localHeaderOffset,
std::vector<uint8_t>& zip64Extra) {
zip64Extra.clear();
zip64Extra.reserve(28); // 2 + 2 + 3 * 8
// Header ID
appendLE16(zip64Extra, EXTRA_ID_ZIP64);
// Data size = 24 (3x uint64)
appendLE16(zip64Extra, 24);
appendLE64(zip64Extra, h.uncompressedSize);
appendLE64(zip64Extra, h.compressedSize);
appendLE64(zip64Extra, localHeaderOffset);
}
```
Then `writeCentralDirectory` becomes simpler:
```cpp
for (auto& entry : centralDir_) {
auto& h = entry.header;
buf.clear();
bool needZip64 = h.isZip64() || entry.localHeaderOffset >= ZIP_UINT32_MAX;
appendLE32(buf, CENTRAL_DIR_HEADER_SIG);
appendLE16(buf, h.versionMadeBy);
appendLE16(buf, needZip64 ? ZIP_VERSION_45 : h.versionNeeded);
appendLE16(buf, h.flags);
appendLE16(buf, h.method);
appendLE16(buf, h.modTime);
appendLE16(buf, h.modDate);
appendLE32(buf, h.crc32);
if (needZip64) {
appendLE32(buf, ZIP_UINT32_MAX);
appendLE32(buf, ZIP_UINT32_MAX);
} else {
appendLE32(buf, static_cast<uint32_t>(h.compressedSize));
appendLE32(buf, static_cast<uint32_t>(h.uncompressedSize));
}
appendLE16(buf, static_cast<uint16_t>(h.name.size()));
std::vector<uint8_t> zip64Extra;
if (needZip64) {
buildZip64Extra(h, entry.localHeaderOffset, zip64Extra);
}
size_t extraLen = h.extra.size() + zip64Extra.size();
appendLE16(buf, static_cast<uint16_t>(extraLen));
appendLE16(buf, 0); // comment length
appendLE16(buf, 0); // disk number
appendLE16(buf, 0); // internal attr
appendLE32(buf, h.externalAttr);
appendLE32(buf, entry.localHeaderOffset > ZIP_UINT32_MAX
? ZIP_UINT32_MAX
: static_cast<uint32_t>(entry.localHeaderOffset));
// fixed part
file_.write(reinterpret_cast<const char*>(buf.data()), buf.size());
currentOffset_ += buf.size();
// name + original extra + zip64 extra
file_.write(h.name.c_str(), h.name.size());
currentOffset_ += h.name.size();
if (!h.extra.empty()) {
file_.write(reinterpret_cast<const char*>(h.extra.data()), h.extra.size());
currentOffset_ += h.extra.size();
}
if (!zip64Extra.empty()) {
file_.write(reinterpret_cast<const char*>(zip64Extra.data()), zip64Extra.size());
currentOffset_ += zip64Extra.size();
}
}
```
This keeps the logic the same but removes the low‑level byte loops from the core flow.
### 3. Split EOCD writing into small helpers
`writeEndOfCentralDirectory` currently builds ZIP64 EOCD, ZIP64 locator, and classic EOCD in one function with local `write16`/`write32`/`write64`. You can keep the signature and behavior but factor out the record builders:
```cpp
inline void buildZip64Eocd(uint64_t records,
uint64_t size,
uint64_t offset,
std::vector<uint8_t>& buf) {
buf.clear();
buf.reserve(56); // directory64EndLen
appendLE32(buf, ZIP64_END_OF_CENTRAL_DIR_SIG);
appendLE64(buf, 44); // size of remaining record
appendLE16(buf, ZIP_VERSION_45); // version made by
appendLE16(buf, ZIP_VERSION_45); // version needed
appendLE32(buf, 0); // disk number
appendLE32(buf, 0); // disk with start
appendLE64(buf, records);
appendLE64(buf, records);
appendLE64(buf, size);
appendLE64(buf, offset);
}
inline void buildZip64Locator(uint64_t zip64EocdOffset,
std::vector<uint8_t>& buf) {
// append to existing buf
appendLE32(buf, ZIP64_END_OF_CENTRAL_DIR_LOCATOR_SIG);
appendLE32(buf, 0); // disk with start of zip64 EOCD
appendLE64(buf, zip64EocdOffset);
appendLE32(buf, 1); // total number of disks
}
inline void buildClassicEocd(uint64_t records,
uint64_t size,
uint64_t offset,
const std::string& comment,
std::vector<uint8_t>& buf) {
buf.clear();
buf.reserve(22 + comment.size());
appendLE32(buf, END_OF_CENTRAL_DIR_SIG);
appendLE16(buf, 0); // this disk
appendLE16(buf, 0); // disk with start
appendLE16(buf, static_cast<uint16_t>(records));
appendLE16(buf, static_cast<uint16_t>(records));
appendLE32(buf, static_cast<uint32_t>(size));
appendLE32(buf, static_cast<uint32_t>(offset));
appendLE16(buf, static_cast<uint16_t>(comment.size()));
}
```
Use them in `writeEndOfCentralDirectory`:
```cpp
Error ZipWriter::writeEndOfCentralDirectory(uint64_t centralDirOffset,
uint64_t centralDirSize) {
uint64_t records = centralDir_.size();
uint64_t size = centralDirSize;
uint64_t offset = centralDirOffset;
std::vector<uint8_t> buf;
bool needZip64 =
(records >= ZIP_UINT16_MAX ||
size >= ZIP_UINT32_MAX ||
offset >= ZIP_UINT32_MAX);
if (needZip64) {
buildZip64Eocd(records, size, offset, buf);
buildZip64Locator(currentOffset_, buf); // offset of zip64 EOCD
file_.write(reinterpret_cast<const char*>(buf.data()), buf.size());
currentOffset_ += buf.size();
records = ZIP_UINT16_MAX;
size = ZIP_UINT32_MAX;
offset = ZIP_UINT32_MAX;
}
buildClassicEocd(records, size, offset, comment_, buf);
file_.write(reinterpret_cast<const char*>(buf.data()), buf.size());
if (!comment_.empty()) {
file_.write(comment_.c_str(), comment_.size());
}
if (!file_.good()) {
return Error(ErrorCode::FILE_WRITE_ERROR,
"Failed to write end of central directory");
}
currentOffset_ += buf.size() + comment_.size();
return Error();
}
```
This keeps the ZIP64 behavior intact but makes `writeEndOfCentralDirectory` mostly high‑level decisions, with low‑level packing isolated in short helpers.
Overall, these small extractions should address the “too complex / too repetitive” feedback while preserving all current functionality and ZIP64 behavior.
</issue_to_address>
deepin pr auto review

Git Diff Code Review Report

Overall assessment: This change mainly adds ZIP64 support to the ZIP file writer to handle files larger than 4GB. The code is modeled on the Go standard library.

1. Syntax and logic review

1.1 fast_deflate.h/cpp

Strengths:
Suggested improvements:

1.2 file_task.h

Strengths:

Suggested improvements:

1.3 zip_writer.h/cpp

Strengths:

Suggested improvements:

2. Code quality

Strengths:

Suggested improvements:

3. Performance

Strengths:

Suggested improvements:

4. Security

Strengths:

Suggested improvements:

5. Other suggestions
Summary: The overall quality of this change is good, and ZIP64 support is implemented correctly. The main room for improvement lies in:

Thorough testing is recommended before merging, especially for large files and boundary cases.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dengzhongyuan365-dev, lzwind. The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
/forcemerge |
feat: Implement ZIP64 support in pzip for large file handling
Log: Improve ZIP file handling for large files with ZIP64 support
Summary by Sourcery
Add ZIP64 support to the pzip writer to correctly handle large files and central directories beyond classic ZIP size limits.
New Features:
Enhancements: